On dangers of cross-validation in steganalysis

Author

  • Jan Kodovský
Abstract

Modern steganalysis combines feature space design with supervised binary classification. In this report, we assume that the feature space has already been constructed, i.e., the steganalyst has a set of training features and needs to train a binary classifier. Any machine learning tool can be used for this task, and its parameters can be tuned through cross-validation, a standard automated model-selection procedure. However, classification problems arising in steganalysis have a very specific nature: individual training samples naturally form pairs of cover–stego feature vectors with opposite labels lying close to each other in the feature space. It is important to preserve these cover–stego pairs during cross-validation, i.e., to prevent the two members of a pair from being split into different folds; otherwise, the obtained error estimates may be misleading and lead to suboptimal performance of the classifier. In this report, we demonstrate this problem with cross-validation on a specific example of image steganalysis in the JPEG domain. As a classifier, we selected the support vector machine (SVM), a popular choice in steganalysis. In particular, we show that the implicit k-fold cross-validation as implemented in LIBSVM [2], a widely used implementation of SVM, is not suitable for steganalysis and may result in suboptimal performance and a striking discrepancy between the predicted and the real testing error. Instead of the implicit k-fold cross-validation, a steganalysis-aware cross-validation that preserves cover–stego pairs should be used. We stress that this is a steganalysis-specific issue and does not indicate any implementation flaw in LIBSVM.

The issue with the standard cross-validation procedure in steganalysis has already been pointed out by Schwamberger and Franz in 2010 [14]. We believe, however, that the message may have been obscured for the reader by the other experiments and conclusions presented in [14], as the authors studied not only cross-validation but also different normalization techniques, and performed numerous experiments using different features and stego-algorithms. This technical report, on the other hand, is devoted solely to the problem of improper cross-validation. We go into more depth, provide an explanation, and also study the severity of the problem with respect to payload. Moreover, we point out a few examples of published work with results affected by improper cross-validation.
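The pair-preserving idea described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function name `pair_preserving_folds` and the index layout (sample `2*i` is the cover of image `i`, sample `2*i + 1` is its stego version) are assumptions made only for this example. The point it shows is that folds are formed over pairs rather than over individual samples, so a cover and its stego counterpart always land on the same side of every split.

```python
import random


def pair_preserving_folds(n_pairs, k, seed=0):
    """Yield (train_idx, test_idx) for k-fold CV that never splits a pair.

    Assumed (hypothetical) layout: sample 2*i is the cover feature vector
    of image i, and sample 2*i + 1 is the matching stego feature vector.
    """
    pair_ids = list(range(n_pairs))
    # Shuffle whole pairs, not individual samples.
    random.Random(seed).shuffle(pair_ids)
    # Deal the shuffled pairs round-robin into k folds.
    folds = [pair_ids[f::k] for f in range(k)]
    for f in range(k):
        test_pairs = folds[f]
        train_pairs = [p for g in range(k) if g != f for p in folds[g]]
        # Expand each pair id into its two sample indices.
        test_idx = [s for p in test_pairs for s in (2 * p, 2 * p + 1)]
        train_idx = [s for p in train_pairs for s in (2 * p, 2 * p + 1)]
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in pair_preserving_folds(n_pairs=10, k=5):
        print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

The resulting index lists could then drive a manual parameter search in which the classifier is trained on `train_idx` and scored on `test_idx` for each candidate setting, instead of relying on a built-in k-fold option that splits samples individually. In scikit-learn, `GroupKFold` with the pair id passed as `groups` should give an equivalent grouping.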

Similar resources

Neighboring Joint Density and Markov Process Based Approach for JPEG Steganalysis

Steganalysis is the method used to detect the presence of any hidden message in a cover medium. A novel approach based on feature mining on the discrete cosine transform (DCT) domain, Markov process based approach for modeling the difference JPEG 2-D arrays, machine learning for steganalysis of JPEG images which prevents cross validation is proposed. The neighboring joint density and absolute n...

A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features and its Performance Analysis

This paper presents a blind steganalysis technique to effectively attack JPEG steganographic schemes, i.e., Jsteg, F5, Outguess, and DWT-based schemes. The proposed method exploits the correlations between block-DCT coefficients from intra-block and inter-block relations, and the statistical moments of characteristic functions of the test image are selected as features. The features are extracted from the...

Ensemble classification in steganalysis – Cross-validation and AdaBoost

Two alternative designs to the ensemble classifier proposed in [13] are studied in this report. First, the out-of-bag error estimation is replaced with cross-validation. Second, we incorporate AdaBoost and modify the weights of the individual training samples as the training progresses. The final decision is formed as a weighted combination of individual predictions rather than through majority ...

On the Dangers of Cross-Validation. An Experimental Evaluation

Cross validation allows models to be tested using the full training set by means of repeated resampling; thus, maximizing the total number of points used for testing and potentially, helping to protect against overfitting. Improvements in computational power, recent reductions in the (computational) cost of classification algorithms, and the development of closed-form solutions (for performing ...

Steganalysis Method for LSB Replacement Based on Local Gradient of Image Histogram

In this paper we present a new accurate steganalysis method for LSB replacement steganography. The suggested method is based on the changes that occur in the histogram of an image after the embedding of data. Every pair of neighboring bins of a histogram is either inter-related or unrelated, depending on whether embedding a bit of data in the image could affect both bins or not. We show that...

Publication date: 2011